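CNN-based face recognition models have brought remarkable performance improvements, but they are vulnerable to adversarial perturbations. Recent studies have shown that adversaries can fool these models even when they can access only the models' hard-label output. However, since many queries are needed to find imperceptible adversarial noise, reducing the number of queries is crucial for these attacks. In this paper, we point out two limitations of existing decision-based black-box attacks. We observe that they waste queries on optimizing background noise, and that they do not take advantage of adversarial perturbations generated for other images. We exploit 3D face alignment to overcome these limitations and propose a general strategy for query-efficient black-box attacks on face recognition, named Geometrically Adaptive Dictionary Attack (GADA). Our core idea is to create an adversarial perturbation in the UV texture map and project it onto the face in the image. This greatly improves query efficiency by limiting the perturbation search space to the facial area and by effectively recycling previous perturbations. We apply the GADA strategy to two existing attack methods and show overwhelming performance improvements in experiments on the LFW and CPLFW datasets. Furthermore, we present a novel attack strategy that can circumvent similarity-based stateful detection, which identifies the process of query-based black-box attacks.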
translated by 谷歌翻译
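Although deep neural networks show unprecedented performance in various tasks, their vulnerability to adversarial examples hinders their deployment in safety-critical systems. Many studies have shown that attacks are possible even in a black-box setting, where the adversary cannot access the target model's internal information. Most black-box attacks are based on queries, each of which obtains the target model's output for an input, and many studies focus on reducing the number of required queries. In this paper, we draw attention to an implicit assumption of query-based attacks: that the target model's output exactly corresponds to the query input. Introducing some randomness into the model can break this assumption, so query-based attacks may have tremendous difficulty in both gradient estimation and local search, which are the core of their attack process. From this motivation, we observe that even a small additive input noise can neutralize most query-based attacks, and we name this simple yet effective approach Small Noise Defense (SND). We analyze how SND defends against query-based black-box attacks and demonstrate its effectiveness against eight state-of-the-art attacks on the CIFAR-10 and ImageNet datasets. Even with its strong defense ability, SND almost maintains the original classification accuracy and computational speed. SND is readily applicable to pre-trained models by adding only one line of code at inference.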
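The idea of adding small noise to the input at inference can be illustrated with a minimal sketch. This is not the authors' code: the names `add_small_noise`, `defended_predict`, and `toy_model`, the Gaussian noise choice, and the value `sigma=0.01` are all assumptions made for illustration.

```python
import random

def add_small_noise(x, sigma=0.01, rng=random):
    """Return a copy of x with small additive Gaussian noise (assumed noise model)."""
    return [v + rng.gauss(0.0, sigma) for v in x]

def defended_predict(model, x, sigma=0.01, rng=random):
    """Hypothetical wrapper: the only change at inference is the noise added to x."""
    return model(add_small_noise(x, sigma, rng))

def toy_model(x):
    # Stand-in classifier for illustration: label 1 if the mean exceeds 0.5.
    return int(sum(x) / len(x) > 0.5)

if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.9] * 8
    # Repeated queries on the same input see slightly different inputs,
    # which is what disturbs an attacker's gradient estimation and local search.
    print(add_small_noise(x, rng=rng)[:3])
    print(defended_predict(toy_model, x, rng=rng))
```

Because the noise is small, an input far from the decision boundary keeps its label, while an attacker probing near the boundary receives inconsistent answers across queries.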
In this paper, we propose DiffFace, the first diffusion-based face swapping framework, composed of training an ID-conditional DDPM, sampling with facial guidance, and target-preserving blending. Specifically, in the training process, the ID-conditional DDPM is trained to generate face images with the desired identity. In the sampling process, we use off-the-shelf facial expert models to make the model transfer the source identity while faithfully preserving target attributes. During this process, to preserve the background of the target image and obtain the desired face swapping result, we additionally propose a target-preserving blending strategy. It helps our model keep the attributes of the target face from noise while transferring the source facial identity. In addition, without any re-training, our model can flexibly apply additional facial guidance and adaptively control the ID-attributes trade-off to achieve the desired results. To the best of our knowledge, this is the first approach to apply a diffusion model to the face swapping task. Compared with previous GAN-based approaches, DiffFace takes advantage of the diffusion model to offer benefits such as training stability, high fidelity, sample diversity, and controllability. Extensive experiments show that DiffFace is comparable or superior to state-of-the-art methods on several standard face swapping benchmarks.
In recent years, generative models have advanced significantly due to the success of diffusion models. This success is often attributed to guidance techniques, such as classifier and classifier-free guidance, which provide effective mechanisms to trade off fidelity against diversity. However, these methods cannot guide a generated image to be aware of its geometric configuration, e.g., depth, which hinders the application of diffusion models to areas that require a certain level of depth awareness. To address this limitation, we propose a novel guidance approach for diffusion models that uses estimated depth information derived from the rich intermediate representations of diffusion models. To do this, we first present a label-efficient depth estimation framework using the internal representations of diffusion models. At the sampling phase, we use two guidance techniques to self-condition the generated image on the estimated depth map: the first uses pseudo-labeling, and the second uses a depth-domain diffusion prior. Experiments and extensive ablation studies demonstrate the effectiveness of our method in guiding diffusion models toward geometrically plausible image generation. The project page is available at https://ku-cvlab.github.io/DAG/.
Standard empirical risk minimization (ERM) can underperform on certain minority groups (e.g., waterbirds on land or landbirds on water) due to spurious correlations between the input and its label. Several studies have improved worst-group accuracy by focusing on high-loss samples. The hypothesis behind this is that such high-loss samples are \textit{spurious-cue-free} (SCF) samples. However, these approaches can be problematic, since in real-world scenarios the high-loss samples may also be samples with noisy labels. To resolve this issue, we utilize the predictive uncertainty of a model to improve worst-group accuracy under noisy labels. To motivate this, we theoretically show that the high-uncertainty samples are the SCF samples in the binary classification problem. This theoretical result implies that predictive uncertainty is an adequate indicator for identifying SCF samples in a noisy-label setting. Motivated by this, we propose a novel ENtropy based Debiasing (END) framework that prevents models from learning the spurious cues while being robust to noisy labels. In the END framework, we first train the \textit{identification model} to obtain the SCF samples from a training set using its predictive uncertainty. Then, another model is trained on the dataset augmented with an oversampled SCF set. The experimental results show that our END framework outperforms other strong baselines on several real-world benchmarks that consider both noisy labels and spurious cues.
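The use of predictive uncertainty to pick out candidate SCF samples can be sketched as follows. This is only an illustration under stated assumptions: Shannon entropy of the predicted class probabilities stands in for "predictive uncertainty", and the fixed `threshold` rule and the function names are hypothetical, since the abstract does not specify the actual selection criterion.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_high_uncertainty(prob_rows, threshold):
    """Indices of samples whose entropy is at least `threshold` (assumed rule)."""
    return [i for i, probs in enumerate(prob_rows)
            if predictive_entropy(probs) >= threshold]

def oversample(dataset, selected, factor=2):
    """Augment the dataset with `factor` extra copies of each selected sample."""
    extra = [dataset[i] for i in selected for _ in range(factor)]
    return list(dataset) + extra

if __name__ == "__main__":
    # Hypothetical identification-model outputs for three samples.
    probs = [[0.99, 0.01], [0.5, 0.5], [0.6, 0.4]]
    picked = select_high_uncertainty(probs, threshold=0.6)
    print(picked)
    print(oversample(["a", "b", "c"], picked))
```

A second model would then be trained on the augmented dataset. Unlike loss, entropy does not depend on the (possibly noisy) label, which is the property the abstract relies on.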
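The convolutional neural network (CNN) has become one of the most popular and prominent deep learning architectures for computer vision, but its black-box nature hides the internal prediction process. For this reason, AI practitioners have turned to explainable AI to provide interpretability of model behavior. In particular, class activation map (CAM) and Grad-CAM based methods have shown promising results, but they have architectural limitations or a gradient computation burden. To resolve these issues, Score-CAM has been suggested as a gradient-free method; however, it requires more execution time than CAM or Grad-CAM based methods. Therefore, we propose a lightweight-architecture and gradient-free Reciprocal CAM (Recipro-CAM), which spatially masks the extracted feature maps to exploit the correlation between activation maps and network outputs. With the proposed method, we achieve gains of 1.78 to 3.72% in the Average Drop-Coherence-Complexity (ADCC) metric on the ResNet family, excluding VGG-16 (1.39%). In addition, Recipro-CAM exhibits a saliency map generation rate similar to Grad-CAM and is approximately 148 times faster than Score-CAM.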
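We present a novel method for exemplar-based image translation, called matching interleaved diffusion models (MIDMs). Most existing methods for this task are formulated as a GAN-based matching-then-generation framework. However, in this framework, matching errors induced by the difficulty of semantic matching across domains, e.g., sketch and photo, can easily propagate to the generation step, which in turn leads to degenerated results. Motivated by the recent success of diffusion models in overcoming the shortcomings of GANs, we incorporate diffusion models to overcome these limitations. Specifically, we formulate a diffusion-based matching-and-generation framework that interleaves cross-domain matching and diffusion steps in the latent space by iteratively feeding the intermediate warp into the noising process and denoising it to generate a translated image. In addition, to improve the reliability of the diffusion process, we design a confidence-aware process using cycle-consistency so that only confident regions are considered during translation. Experimental results show that our MIDMs generate more plausible images than state-of-the-art methods.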
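Multimodal machine learning has been widely studied for the development of general intelligence. Recently, the remarkable multimodal algorithms Perceiver and Perceiver IO have shown competitive results across diverse dataset domains and tasks. However, Perceiver and Perceiver IO have focused on heterogeneous modalities, including image, text, and speech, and there are few research works on graph-structured datasets. A graph is one of the most general dataset structures, and other datasets, including images, text, and speech, can be represented as graph-structured data. Unlike other dataset domains such as text and image, a graph has an adjacency matrix, and handling its topological information, relational information, and canonical positional information is not trivial. In this study, we present Graph Perceiver IO, the Perceiver IO for graph-structured datasets. We keep the main structure of Graph Perceiver IO the same as that of Perceiver IO, because Perceiver IO already handles diverse datasets well, except for graph-structured datasets. Graph Perceiver IO is a general method that can handle diverse datasets, including graph-structured data as well as text and images. Compared with graph neural networks, Graph Perceiver IO requires lower complexity and can efficiently incorporate local and global information. We show that Graph Perceiver IO achieves competitive results on diverse graph-related tasks, including node classification, graph classification, and link prediction.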
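As digitized traditional cultural heritage documents have rapidly increased, raising the need for their preservation and management, practical recognition of entities and typification of their classes has become essential. To achieve this, we propose KoCHET, a Korean cultural heritage corpus for the typical entity-related tasks, i.e., named entity recognition (NER), relation extraction (RE), and entity typing (ET). Built with advice from cultural heritage experts and based on the data construction guidelines of government-affiliated organizations, KoCHET consists of 112,362, 38,765, and 113,198 examples for the NER, RE, and ET tasks, respectively, covering all entity types related to Korean cultural heritage. Moreover, unlike existing public corpora, modified redistribution is allowed. Our experimental results make the practical usability of KoCHET more valuable in terms of cultural heritage. We also provide practical insights into KoCHET through statistical and linguistic analysis. Our corpus is freely available at https://github.com/gyeeongmin47/kochet.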
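Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. Distributional reinforcement learning (DRL) methods instead model the value distribution, which has been shown to improve performance in many settings. In this paper, we model the value distribution as approximately normal using the Markov chain central limit theorem. We analytically compute quantiles to provide a new DRL target that is informed by the decrease in standard deviation that occurs over the course of an episode. In addition, we suggest an exploration strategy based on how closely the learned value distribution resembles the target normal distribution, making the value function more accurate for better policy improvement. The approach we outline is compatible with many DRL structures. We use proximal policy optimization as a testbed and show that both the normality-guided target and the exploration bonus improve performance. We demonstrate that our method outperforms DRL baselines on a number of continuous control tasks.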
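Analytically computing the quantiles of a normal value distribution can be sketched with the Python standard library. This is an illustration, not the paper's method: the midpoint quantile levels (2i+1)/(2N) are an assumption borrowed from common quantile-based DRL practice, and the function name and example numbers are hypothetical.

```python
from statistics import NormalDist

def normal_quantile_targets(mean, std, n_quantiles):
    """Quantiles of N(mean, std^2) at midpoint levels (2i+1)/(2N) (assumed levels)."""
    dist = NormalDist(mu=mean, sigma=std)
    levels = [(2 * i + 1) / (2 * n_quantiles) for i in range(n_quantiles)]
    return [dist.inv_cdf(q) for q in levels]

if __name__ == "__main__":
    # As the episode progresses the standard deviation of the remaining
    # return shrinks, so the target quantiles contract toward the mean.
    for std in (2.0, 1.0, 0.5):
        print([round(q, 3) for q in normal_quantile_targets(1.0, std, 5)])
```

With an odd number of quantiles the middle target equals the mean, and the targets are symmetric around it, which follows from the symmetry of the normal distribution.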